Abstract

The expected length of longest common subsequences is a problem that
has been in the literature for at least twenty-five years. Determining the
limiting constants $\gamma_k$ appears to be quite difficult, and the current best
bounds leave much room for improvement. Boutet de Monvel explores an
independent version of the problem, which he calls the Bernoulli Matching model,
and its relation to the longest common subsequence problem. This paper
continues this pursuit by focusing on a simplification we term r-reach. For
the string model, $L_r(u, v)$ is the length of a longest common subsequence of
$u$ and $v$ given that each matched pair of letters is no more than $r$
letters apart.

1 Introduction 

In our technology-oriented society, fast processing of digital data is becoming
increasingly important. String comparison is a kind of data processing that has
applications in a wide range of fields including molecular biology, human speech
recognition, computer spelling correction, and gas chromatography [4]. A robust,
extensively studied method for comparing two strings, say $u$ and $v$, is to compute
the length of one of their longest common subsequences (denote this length by
$L(u, v)$). A subsequence of a string $w$ is a string obtained by deleting some elements
of $w$. For example, netra is a subsequence of cinematography. A longest common
subsequence of two strings $u$ and $v$ is a common subsequence of $u$ and $v$ of maximum
length. For example, netra is a longest common subsequence of cinematography
and neurotransmitter because there is no longer string that is a subsequence of
both strings.

1.1 The Random String model 

The following notation will be useful for working with strings:

Definition. Define an alphabet $\Sigma$ of size $k$ to be $\{0, 1, \ldots, k-1\}$. Let $\Sigma^n$ be the
set of all sequences of length $n$ on alphabet $\Sigma$.

Definition. If $u = u_1 u_2 \cdots u_n$ and $u_i \in \Sigma$, define $u(i \cdots j)$ to be the substring
$u_i u_{i+1} \ldots u_j$.

Figure 1: Current best bounds and Monte Carlo approximations of $\gamma_k$. Lower
bounds are from [3] and [1]. Upper bounds are from [3]. Approximations are
from [2] and were computed using Monte Carlo simulations extrapolated to large
$n$ using a fit with real numbers $A_k$, $C_k$ that do not depend on $n$.

A very interesting and difficult problem is to compute the average length of
longest common subsequences over all possible pairs of strings. More precisely,
define

\[ EL_n^{(k)} = \frac{1}{k^{2n}} \sum_{u, v \in \Sigma^n} L(u, v). \]

An open problem is to compute the following limit:

\[ \gamma_k = \lim_{n \to \infty} \frac{EL_n^{(k)}}{n}. \]

Klarner and Rivest established that $EL_n^{(k)}$ is superadditive, that is,
$EL_{n+m}^{(k)} \ge EL_n^{(k)} + EL_m^{(k)}$, and from this it can be shown that the
above limit exists (see e.g., [1]).

The current best lower and upper bounds as well as Monte Carlo approximations
of $\gamma_k$ are shown in Figure 1.

Longest common subsequence computations can also be formulated as a dynamic
programming algorithm or as a directed time passage percolation model
(see e.g. [3], [2]). In the directed time passage percolation model, we work with the
two dimensional lattice in the first quadrant: vertices exist at the points $(i, j)$ for
$i, j \in \{0, 1, 2, \ldots\}$. On each vertex $(i, j)$ there is an integer $D_{i,j}$, and $D_{i,0}$ and $D_{0,i}$
are initialized to 0. Given two strings $u$ and $v$, $L(u, v)$ is computed by preserving
$D_{i,j} = L(u(1 \ldots i), v(1 \ldots j))$. The necessary recurrence is

\[
D_{i,j} = \begin{cases}
D_{i-1,j-1} + 1 & \text{if } \delta_{u(i),v(j)} = 1 \\
\max\{D_{i,j-1}, D_{i-1,j}\} & \text{if } \delta_{u(i),v(j)} = 0
\end{cases}
\]

where $\delta_{u(i),v(j)}$ is the Kronecker delta (the motivations for this notation will
become clear in the next section). Another way of looking at this recurrence is
to make bonds between adjacent vertices in the lattice directed in the positive x
and y directions. A diagonal bond from $(i-1, j-1)$ to $(i, j)$ is added if and only
if $\delta_{u(i),v(j)} = 1$. If the horizontal and vertical bonds are given weight 0, and the
diagonal bonds are given weight 1, $L(u, v)$ is the weight of a maximum weight
path from $(0,0)$ to $(|u|, |v|)$.
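The recurrence above translates directly into a short dynamic program. The following is a minimal sketch, not code from the paper; the function name `lcs_length` is a hypothetical choice, and strings are indexed from 0 in the code while the text indexes from 1.

```python
def lcs_length(u: str, v: str) -> int:
    """Length L(u, v) of a longest common subsequence via the D[i][j] recurrence."""
    m, n = len(u), len(v)
    # D[i][j] = L(u(1..i), v(1..j)); row 0 and column 0 stay 0.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if u[i - 1] == v[j - 1]:      # Kronecker delta equals 1: diagonal bond
                D[i][j] = D[i - 1][j - 1] + 1
            else:                          # Kronecker delta equals 0
                D[i][j] = max(D[i][j - 1], D[i - 1][j])
    return D[m][n]


print(lcs_length("cinematography", "neurotransmitter"))  # 5, the length of "netra"
```

The table costs O(|u||v|) time and memory, which is adequate for illustrating the recurrence.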



1.2 The Bernoulli Matching model 

A related problem called the Bernoulli Matching model is named and well explored
by Boutet de Monvel in [2]. It is most readily seen as a modification of the
directed time passage percolation model. Instead of placing diagonal bonds based
on a match in a pair of strings, diagonal bonds are placed independently at each
location with probability $1/k$. In the random string model, the probability of
a bond between $(i-1, j-1)$ and $(i, j)$ is also $1/k$, but these probabilities are not
independent. The recurrence for the Bernoulli Matching model is

\[
D_{i,j} = \begin{cases}
D_{i-1,j-1} + 1 & \text{if } e_{ij} = 1 \\
\max\{D_{i,j-1}, D_{i-1,j}\} & \text{if } e_{ij} = 0
\end{cases}
\]

where the $e_{ij}$ are independent random variables with $\Pr(e_{ij} = 1) = 1/k$ and
$\Pr(e_{ij} = 0) = 1 - 1/k$. Let $EL_n^{B(k)}$ be the expected value of $D_{n,n}$ given this model.
$EL_n^{B(k)}$, like $EL_n^{(k)}$, is superadditive [2] and therefore the following limit exists:

\[ \gamma^B_k = \lim_{n \to \infty} \frac{EL_n^{B(k)}}{n}. \]
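The Bernoulli Matching recurrence can be sampled directly. The sketch below is illustrative only: the function name `bernoulli_matching_length`, the seed, and the values of `n` and `trials` are assumptions chosen to keep the run short, not settings taken from the paper.

```python
import random


def bernoulli_matching_length(n: int, k: int, rng: random.Random) -> int:
    """Sample D[n][n] when every diagonal bond is present independently with probability 1/k."""
    D = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if rng.random() < 1.0 / k:    # e_ij = 1
                D[i][j] = D[i - 1][j - 1] + 1
            else:                          # e_ij = 0
                D[i][j] = max(D[i][j - 1], D[i - 1][j])
    return D[n][n]


rng = random.Random(0)
n, k, trials = 200, 2, 20
estimate = sum(bernoulli_matching_length(n, k, rng) for _ in range(trials)) / (trials * n)
print(estimate)  # a crude finite-n estimate of the limiting constant for k = 2
```

For small `n` the estimate sits below the limiting constant because superadditivity makes $EL_n/n$ approach its limit from below.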



Boutet de Monvel [2] has conjectured that $\gamma^B_k = \frac{2}{1 + \sqrt{k}}$ and gives a more
general conjecture for the off diagonal lattice positions (Steele conjectured this
for the Random String model in 1982; Boutet de Monvel refined it in 1999). He
also presents a nice derivation of this result based on cavity methods typically used
for the mean field theory of disordered systems, which he does not try to justify
rigorously. Though not yet a proof, the method appears to solve the problem
quite elegantly and agrees well with numerical approximations.



1.3 The r-reach simplification

A straightforward way of obtaining a lower bound for $EL_n^{(k)}$ is to only consider
common subsequences that do not match letters "too far" from each other. This
is equivalent to restricting the lattice to a diagonal band of fixed width with center
line $x = y$. More precisely, let $L_r(u, v)$ be the length of a common subsequence of
$u$ and $v$ that is as long as possible given that if $u(i) = v(j)$ are paired by the subsequence,
then $|i - j| \le r$. We will use R instead of D when working with r-reach. The
recurrence is modified as follows ($R_{i,0}$ and $R_{0,i}$ are initialized to 0 as before):

\[
R_{i,j} = \begin{cases}
R_{i-1,j-1} + 1 & \text{if } \delta_{u(i),v(j)}\ (e_{ij}) = 1 \\
\max\{R_{i,j-1}, R_{i-1,j}\} & \text{if } \delta_{u(i),v(j)}\ (e_{ij}) = 0 \text{ and } |i - j| < r \\
R_{i,j-1} & \text{if } \delta_{u(i),v(j)}\ (e_{ij}) = 0 \text{ and } j - i \ge r \\
R_{i-1,j} & \text{if } \delta_{u(i),v(j)}\ (e_{ij}) = 0 \text{ and } i - j \ge r
\end{cases}
\]

Let $EL_{n,k,r}$ ($EL^B_{n,k,r}$) be the expected value of $R_{n,n}$ given this model.
Superadditivity still holds in this model (i.e. $EL^B_{n,k,r} + EL^B_{m,k,r} \le EL^B_{(n+m),k,r}$)
because a maximum weight path from $(0, 0)$ to $(n+m, n+m)$ has weight at least
as large as (weight of maximum weight path from $(0,0)$ to $(n, n)$) + (weight of
maximum weight path from $(n, n)$ to $(n+m, n+m)$). The same argument applies
to $EL_{n,k,r}$. Now define

\[ \gamma_{k,r} = \lim_{n \to \infty} \frac{EL_{n,k,r}}{n}, \qquad \gamma^B_{k,r} = \lim_{n \to \infty} \frac{EL^B_{n,k,r}}{n}. \]
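For concreteness, the r-reach length of two given strings can be computed with the usual table if diagonal steps are only allowed when $|i - j| \le r$. This is a sketch under that reading of the definition; the name `lcs_reach` is hypothetical. It fills the whole lattice rather than only the band, which is simpler and gives the same value because horizontal and vertical bonds carry weight 0.

```python
def lcs_reach(u: str, v: str, r: int) -> int:
    """Longest common subsequence length when matched positions i, j must satisfy |i - j| <= r."""
    m, n = len(u), len(v)
    R = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = max(R[i][j - 1], R[i - 1][j])
            # A diagonal bond exists only for a letter match inside the band.
            if u[i - 1] == v[j - 1] and abs(i - j) <= r:
                best = max(best, R[i - 1][j - 1] + 1)
            R[i][j] = best
    return R[m][n]


for r in range(0, 6):
    print(r, lcs_reach("cinematography", "neurotransmitter", r))
```

With `r = 0` only letters at identical positions can be matched, and once `r` is at least the string length the function agrees with the unrestricted longest common subsequence length.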

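A simple but quite interesting fact is

Claim 1

\[ \lim_{r \to \infty} \gamma^B_{k,r} = \gamma^B_k \quad \text{and} \quad \lim_{r \to \infty} \gamma_{k,r} = \gamma_k. \]

Proof. r-reach effectively reduces the allowable paths. It is easy to see that for
fixed values of $e_{ij}$, $D_{n,n} \ge R_{n,n}$, and therefore

\[ EL^B_{n,k,r} \le EL_n^{B(k)} \implies \gamma^B_{k,r} \le \gamma^B_k \implies \lim_{r \to \infty} \gamma^B_{k,r} \le \gamma^B_k. \]

Next apply superadditivity and $EL^B_{r,k,r} = EL_r^{B(k)}$ to show

\[ \gamma^B_{k,r} = \lim_{n \to \infty} \frac{EL^B_{n,k,r}}{n} \ge \frac{EL^B_{r,k,r}}{r} = \frac{EL_r^{B(k)}}{r}. \]

Taking the limit of both sides yields

\[ \lim_{r \to \infty} \gamma^B_{k,r} \ge \lim_{r \to \infty} \frac{EL_r^{B(k)}}{r} = \gamma^B_k. \]

The analogous result for the Random String model is proved the same way. ■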



2 Solutions to Bernoulli Matching model r-reach 
for small r 

For small r, the percolation problem can be dissected in full detail. The approach
used is fairly straightforward and computationally intensive. Unfortunately it
appears that the r-reach problem is not as elegant as the original, possibly because
of the "discontinuous" boundary effects at the displaced diagonals $(i, i+r)$ and
$(i+r, i)$. There are several reasons this problem is worth studying, however. First
of all, it gives lower bounds for the original problem. Also, it is an interesting
setting in which to compare the Random String model with the Bernoulli Matching model.
The methods outlined below seem very difficult to use to solve the problem for
general r; however, they provide foundations for numerical work on large r.

The basic idea of the following analyses is to break the lattice into sections
consisting of the $2r+1$ vertices $(n-r, n), (n-r+1, n), \ldots, (n, n), (n, n-1), \ldots, (n, n-r)$
and then compute probabilities that R takes on specific values at these vertices.
We only need to know the distribution of the $n$-th section to compute the
distribution of the $(n+1)$-th section. More formally, let $P_n(z)$ be the probability
that $R_{n,n} = z$. For notational convenience let $x_0 = y_0 = z$. For $n \ge r$ let
$R_n(z, x_1, y_1, x_2, y_2, \ldots, x_r, y_r)$ be the event that ($R_{n-i,n} = x_i$ and $R_{n,n-i} = y_i$
$\forall i \in \{0, 1, \ldots, r\}$). Also define

\[ P_n(z, x_1, y_1, x_2, y_2, \ldots, x_r, y_r) = \Pr(R_n(z, x_1, y_1, x_2, y_2, \ldots, x_r, y_r)). \]

Let $\vec{P}_n(z)$ be a row vector of length $2^{2r}$ whose set of components is

\[ \{P_n(z, x_1, y_1, \ldots, x_r, y_r) : \forall i \in \{1, 2, \ldots, r\},\ x_i = x_{i-1} - d^x_i \text{ and } y_i = y_{i-1} - d^y_i \text{ for some } d^x_i, d^y_i \in \{0, 1\}\}. \]

The order of these components in the vector is not important; we will need to
pick an order later to do matrix multiplication, but for now we will leave this
unspecified. The values of R at adjacent lattice points can only differ by 1 or 0,
so the vector $\vec{P}_n(z)$ contains all possible values for vertices in the same section as
$(n, n)$. Thus

\[ P_n(z) = \sum_{i=1}^{2^{2r}} \vec{P}_n(z)_i = \vec{P}_n(z)\mathbf{1} \]

where $\mathbf{1}$ is the column vector $(1, \ldots, 1)'$.

Now we look at the relationship between $\vec{P}_n(z)$ and $\vec{P}_{n-1}(z)$. Let $x'_0 = y'_0 = z'$.
If $\vec{P}_n(z)_j = P_n(z, x_1, y_1, \ldots, x_r, y_r)$ and $\vec{P}_{n-1}(z')_i = P_{n-1}(z', x'_1, y'_1, \ldots, x'_r, y'_r)$
and $z' \in \{z, z-1\}$, define

\[
\Pr\big(R_n(z, x_1, y_1, \ldots, x_r, y_r) \mid R_{n-1}(z', x'_1, y'_1, \ldots, x'_r, y'_r)\big) =
\begin{cases} M_{ij} & \text{if } z' = z \\ N_{ij} & \text{if } z' = z - 1 \end{cases}
\qquad (1)
\]

It suffices to define this only for $z' = z$ or $z - 1$ because otherwise the probability
is 0. Therefore weighting by the probability of each event $R_{n-1}(\cdot)$ and summing
over all possibilities gives us $\vec{P}_n(z)_j$:

\[ \sum_{i=1}^{2^{2r}} \vec{P}_{n-1}(z)_i M_{ij} + \sum_{i=1}^{2^{2r}} \vec{P}_{n-1}(z-1)_i N_{ij} = \Pr(R_n(z, x_1, y_1, \ldots, x_r, y_r)) = \vec{P}_n(z)_j. \]

Taking the convention that $\vec{P}_n(z)$ is the zero vector for $n < r$, this yields the
recurrence that is true for all $n > r$:

\[ \vec{P}_n(z) = \vec{P}_{n-1}(z) M + \vec{P}_{n-1}(z-1) N \qquad (n > r) \qquad (2) \]

Now we will construct some generating functions. The convention made above
allows the generating function variables n and z to extend over all integers. We
will work with the two different generating functions

\[ H_n(b) = \sum_z \vec{P}_n(z) b^z \quad \text{and} \quad G(a, b) = \sum_{n,z} \vec{P}_n(z) a^n b^z. \]

2.1 The generating function G(a, b)

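Multiplying (2) by $a^n b^z$ and summing over all $n > r$ and all $z$ yields

\[ \sum_{n > r,\, z} \vec{P}_n(z) a^n b^z = \sum_{n > r,\, z} \vec{P}_{n-1}(z) a^n b^z M + \sum_{n > r,\, z} \vec{P}_{n-1}(z-1) a^n b^z N. \]

Add $a^r H_r(b)$ to both sides to obtain

\[ G(a, b) = \sum_{n > r,\, z} \vec{P}_{n-1}(z) a^n b^z M + \sum_{n > r,\, z} \vec{P}_{n-1}(z-1) a^n b^z N + a^r H_r(b). \]

Since $\vec{P}_{r-1}(z)$ is the zero vector, this becomes

\[ G(a, b) = a G(a, b) M + ab\, G(a, b) N + a^r H_r(b). \]

Then

\[ G(a, b)(I - aM - abN) = a^r H_r(b). \qquad (3) \]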



2.2 The generating function $H_n(b)$

We can also multiply (2) by $b^z$ and sum over all $z$ to obtain

\[ H_n(b) = H_{n-1}(b) M + b H_{n-1}(b) N \qquad (n > r). \]

This shows we can obtain $H_n(b)$ by successive multiplications by $M + bN$; that
is, let $T(b) = M + bN$. Then

\[ H_n(b) = H_r(b) T(b)^{n-r}. \]

To obtain the behavior of $T(b)^{n-r}$ as $n \to \infty$ we assume from now on $b > 0$.
We can then apply results about positive matrices (see e.g. [5]). Let
$\det(T(b) - \lambda I) = g(\lambda, b)$, a polynomial in $\lambda$ and $b$, with
$g(\lambda, b) = (\lambda - f_1(b))(\lambda - f_2(b)) \cdots (\lambda - f_{2^{2r}}(b))$.
Let $e(b) = (e_1(b), \ldots, e_{2^{2r}}(b))' > 0$ be s.t. $T(b) e(b) = e(b) f_1(b)$ and let
$e^*(b) = (e^*_1(b), \ldots, e^*_{2^{2r}}(b)) > 0'$ be s.t. $e^*(b) f_1(b) = e^*(b) T(b)$. Normalize $e(b)$ and $e^*(b)$ so
that $e^*(b) e(b) = 1$ and $e^*(b)\mathbf{1} = 1$. Applying results for positive matrices,

\[ \lim_{n \to \infty} \frac{T(b)^n}{f_1(b)^n} = e(b) e^*(b), \qquad \lim_{n \to \infty} \frac{T(b)^n}{n f_1(b)^n} = 0. \qquad (4) \]

When $b = 1$, this becomes

\[ \lim_{n \to \infty} T(1)^n = \mathbf{1} e^*(1) \qquad (5) \]

since $T(1)$ is the transition matrix between probability distributions $H_{n-1}(1)$
and $H_n(1)$.
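Let $h_n(b) = \dfrac{T(b)^n}{n f_1(b)^n}$. We need the following limit result to complete the
analysis. It appears that it should follow from (4), but a proof eludes us. For
now, we will assume it to complete the analysis.

Claim 2

\[ \lim_{n \to \infty} \frac{d h_n(b)}{db}\bigg|_{b=1} = 0. \]

The next step is

\[ \frac{d h_n(b)}{db} = \frac{1}{n f_1(b)^n} \frac{d(T(b)^n)}{db} - \frac{T(b)^n}{f_1(b)^{n+1}} \frac{d f_1(b)}{db}, \]

so that, since $f_1(1) = 1$,

\[ \frac{d h_n(b)}{db}\bigg|_{b=1} = \frac{1}{n} \frac{d(T(b)^n)}{db}\bigg|_{b=1} - T(1)^n \frac{d f_1(b)}{db}\bigg|_{b=1}, \]

and therefore

\[ \lim_{n \to \infty} \frac{1}{n} \frac{d(T(b)^n)}{db}\bigg|_{b=1} = \lim_{n \to \infty} T(1)^n \frac{d f_1(b)}{db}\bigg|_{b=1} = \mathbf{1} e^*(1) \frac{d f_1(b)}{db}\bigg|_{b=1}. \qquad (6) \]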



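Where the last implication follows from the unproven claim and (5). Now we can
apply this result to find $\vec{EP}_n(z)$, which is defined by

\[ \vec{EP}_n(z) = \sum_z z\, \vec{P}_n(z) = \frac{d H_n(b)}{db}\bigg|_{b=1}. \]

Dividing by n and taking the limit of both sides yields

\[
\begin{aligned}
\lim_{n \to \infty} \frac{\vec{EP}_n(z)}{n}
&= \lim_{n \to \infty} \frac{1}{n} \frac{d}{db}\Big(H_r(b) T(b)^{n-r}\Big)\bigg|_{b=1} \\
&= \lim_{n \to \infty} \left( \frac{1}{n} \frac{d H_r(b)}{db}\bigg|_{b=1} T(1)^{n-r} + \frac{1}{n} H_r(1) \frac{d(T(b)^{n-r})}{db}\bigg|_{b=1} \right) \\
&= H_r(1) \lim_{n \to \infty} \frac{1}{n} \frac{d(T(b)^{n-r})}{db}\bigg|_{b=1} \\
&= H_r(1) \mathbf{1} e^*(1) \frac{d f_1(b)}{db}\bigg|_{b=1} \\
&= e^*(1) \frac{d f_1(b)}{db}\bigg|_{b=1}.
\end{aligned}
\]

This last line uses (6) and $H_r(1)\mathbf{1} = \sum_z \vec{P}_r(z)\mathbf{1} = \sum_z P_r(z) = 1$. The equality
above and the equation obtained by multiplying it by $\mathbf{1}$ are stated below; they
will be useful later.

\[ \lim_{n \to \infty} \frac{\vec{EP}_n(z)}{n} = e^*(1) \frac{d f_1(b)}{db}\bigg|_{b=1} \quad \text{and} \quad \lim_{n \to \infty} \frac{EL^B_{n,k,r}}{n} = \gamma^B_{k,r} = \frac{d f_1(b)}{db}\bigg|_{b=1}. \qquad (7) \]

The following claim makes computing $\frac{d f_1(b)}{db}\big|_{b=1}$ easier.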
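Claim 3 Let $f_1(b)$ be the root of $g$ with $f_1(1) = 1$. Then

\[ \frac{d f_1(b)}{db}\bigg|_{b=1} = -\frac{\partial g(1, b)}{\partial b}\bigg|_{b=1} \cdot \frac{\lambda - 1}{g(\lambda, 1)}\bigg|_{\lambda=1}. \]

Proof. Differentiating $g(1, b) = (1 - f_1(b))(1 - f_2(b)) \cdots (1 - f_{2^{2r}}(b))$ with respect to $b$ and
evaluating at $b = 1$ yields

\[ \frac{\partial g(1, b)}{\partial b}\bigg|_{b=1} = -\frac{d f_1(b)}{db}\bigg|_{b=1} (1 - f_2(1)) \cdots (1 - f_{2^{2r}}(1)). \]

$g(\lambda, 1) = (\lambda - 1)(\lambda - f_2(1)) \cdots (\lambda - f_{2^{2r}}(1))$, so $\lambda - 1$ divides $g(\lambda, 1)$. $g(\lambda, 1)$ has only
one root at $\lambda = 1$ because this root corresponds to the eigenvector $\mathbf{1}$ of $T(1)$; $\mathbf{1}$
is the unique positive eigenvector of $T(1)$ (see e.g. [5]). Thus $\frac{\lambda - 1}{g(\lambda, 1)}$ is defined at
$\lambda = 1$ and equals

\[ \frac{1}{(1 - f_2(1)) \cdots (1 - f_{2^{2r}}(1))} \]

from which the claim follows directly. ■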



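2.3 Detailed analysis of 1-reach

When $r = 1$, $\vec{P}_n(z) = (P_n(z, z, z),\ P_n(z, z, z-1),\ P_n(z, z-1, z),\ P_n(z, z-1, z-1))$.
The matrices M and N are not difficult to compute by hand; they are

\[
M = \begin{array}{l|cccc}
P_{n-1}(z, z, z) & \frac{(k-1)^3}{k^3} & 0 & 0 & 0 \\
P_{n-1}(z, z, z-1) & \frac{(k-1)^2}{k^2} & 0 & 0 & 0 \\
P_{n-1}(z, z-1, z) & \frac{(k-1)^2}{k^2} & 0 & 0 & 0 \\
P_{n-1}(z, z-1, z-1) & \frac{k-1}{k} & 0 & 0 & 0
\end{array}
\]

\[
N = \begin{array}{l|cccc}
P_{n-1}(z-1, z-1, z-1) & \frac{1}{k^2} & \frac{k-1}{k^2} & \frac{k-1}{k^2} & \frac{(k-1)^2}{k^3} \\
P_{n-1}(z-1, z-1, z-2) & 0 & \frac{1}{k} & 0 & \frac{k-1}{k^2} \\
P_{n-1}(z-1, z-2, z-1) & 0 & 0 & \frac{1}{k} & \frac{k-1}{k^2} \\
P_{n-1}(z-1, z-2, z-2) & 0 & 0 & 0 & \frac{1}{k}
\end{array}
\]

The expressions to the left of each matrix label the rows according to the
component order defined above; the columns correspond to $P_n(z, z, z)$, $P_n(z, z, z-1)$,
$P_n(z, z-1, z)$, $P_n(z, z-1, z-1)$ in that order. We can also easily compute by
hand $H_1(b) = \left(\frac{k-1}{k}, 0, 0, \frac{b}{k}\right)$.

2.3.1 The two variable generating function

(3) gives us

\[ G(a, b) = a \left(\tfrac{k-1}{k}, 0, 0, \tfrac{b}{k}\right) (I - aM - abN)^{-1}. \]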
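The entries of M and N for 1-reach can be regenerated mechanically by enumerating the three new Bernoulli variables of a section. The sketch below is an illustration rather than the author's procedure: it encodes a state by the differences $d^x = z - x_1$ and $d^y = z - y_1$, and all names (`STATES`, `transfer_matrices`, `left_multiply`, `e_star`) are hypothetical. It uses exact rational arithmetic and checks that the vector proportional to $(k^2, k, k, k+1)$ is stationary for $T(1) = M + N$; the growth rate is taken as $e^*(1) N \mathbf{1}$, the standard first-order perturbation formula for the leading eigenvalue of $M + bN$ at $b = 1$.

```python
from fractions import Fraction
from itertools import product

# States (dx, dy) in the component order (z,z,z), (z,z,z-1), (z,z-1,z), (z,z-1,z-1).
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def transfer_matrices(k):
    """Build M (diagonal value unchanged) and N (diagonal value grows by 1) for r = 1."""
    p = Fraction(1, k)
    M = [[Fraction(0)] * 4 for _ in range(4)]
    N = [[Fraction(0)] * 4 for _ in range(4)]
    for row, (dx, dy) in enumerate(STATES):
        for ex, ey, ez in product((0, 1), repeat=3):
            prob = Fraction(1)
            for bit in (ex, ey, ez):
                prob *= p if bit else 1 - p
            a = ex if dx == 0 else 0   # new R_{n-1,n} exceeds the old R_{n-1,n-1}
            b = ey if dy == 0 else 0   # new R_{n,n-1} exceeds the old R_{n-1,n-1}
            inc = 1 if (ez or a or b) else 0
            col = STATES.index((inc - a, inc - b))
            target = N if inc else M
            target[row][col] += prob
    return M, N


def left_multiply(vec, mat):
    """Row vector times matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(4)) for j in range(4)]


def e_star(k):
    """Candidate normalized left eigenvector of T(1)."""
    norm = k * k + 3 * k + 1
    return [Fraction(k * k, norm), Fraction(k, norm), Fraction(k, norm), Fraction(k + 1, norm)]


k = 2
M, N = transfer_matrices(k)
T1 = [[M[i][j] + N[i][j] for j in range(4)] for i in range(4)]
print(left_multiply(e_star(k), T1) == e_star(k))  # True: the candidate is stationary
print(sum(left_multiply(e_star(k), N)))           # 8/11
```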
Solving this problem with the two variable generating function is computationally
intensive, but it's nothing Maple can't handle. We obtain

\[
G(a, b)' = \frac{1}{P(a, b)}
\begin{pmatrix}
a k^2 (k-1)(-k + ab) \\
-a^2 b (k-1)^2 k \\
-a^2 b (k-1)^2 k \\
-ab\,(a^2 b^2 - ab k^2 - ab k + k^3)
\end{pmatrix}
\]

where

\[ P(a, b) = a^3 b^3 - a^2 b^2 (k^2 + 2k) + a^2 b (k^3 - 3k^2 + 3k - 1) + ab (2k^3 + k^2) + a (k^4 - 3k^3 + 3k^2 - k) - k^4. \]

This potentially gives us the entire distribution of the two variables. The
generating function for the expected value of $\vec{P}_n(z)$, $\vec{EP}_n(z) = \sum_z z \vec{P}_n(z)$, is found by
differentiating with respect to $b$ and then evaluating at $b = 1$. We restrict to the
$k = 2$ case to make the expression simpler and more readable.

\[
\sum_n \vec{EP}_n(z)' a^n = \frac{1}{(a^3 - 7a^2 + 22a - 16)^2}
\begin{pmatrix}
-8a^2 (a^3 - 7a^2 + 14a - 12) \\
4a^2 (a^3 - 4a^2 - a + 8) \\
4a^2 (a^3 - 4a^2 - a + 8) \\
-8a (3a^3 - 16a^2 + 26a - 16)
\end{pmatrix}
\]



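Using Mathematica's Discrete Math RSolve package and a little computation
by hand, we get

\[
\vec{EP}_n(z)' =
\begin{pmatrix}
\frac{32}{121} n - \frac{344}{1331} + 2^{-2n} O(n) \\
\frac{16}{121} n - \frac{40}{1331} + 2^{-2n} O(n) \\
\frac{16}{121} n - \frac{40}{1331} + 2^{-2n} O(n) \\
\frac{24}{121} n + \frac{72}{1331} + 2^{-2n} O(n)
\end{pmatrix}
\]

where the $O(n)$ terms vary like $n \cos(n\theta)$. Summing these components gives us

\[ EL^B_{n,2,1} = \frac{8}{11} n - \frac{32}{121} + 2^{-2n} O(n). \]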



Mathematica can also solve the case for general k, but the expression is
difficult to pick apart because it's so long. To get the behavior of $EL^B_{n,k,1}/n$, divide
$\sum_n EL^B_{n,k,1} a^n$ by $a$ and integrate with respect to $a$. This generating function has
the form

\[ \sum_n \frac{EL^B_{n,k,1}}{n} a^n = \frac{c_1(k)}{1 - a} + c_2(k) \ln(1 - a) + c_3(k) \ln(O(a^2)) + c_4(k)\, \mathrm{arctanh}(O(a)) \]

where the $c_i(k)$ are functions only of $k$; the $O(a^2)$ and $O(a)$ are quadratic and
linear polynomials in $a$ with coefficients a function of $k$. Inferring from the $k = 2$
case, we guess that

\[ EL^B_{n,k,1} = c_1(k) n - c_2(k) + 2^{-2n} O(n) c_5(k). \]

And Maple tells us that

\[ c_1(k) = \frac{3k + 2}{k^2 + 3k + 1}, \qquad c_2(k) = \frac{k (2k^2 + 3k + 2)}{k^4 + 6k^3 + 11k^2 + 6k + 1}. \]

2.3.2 The one variable generating function



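\[
\det(T(b) - \lambda I) = g(\lambda, b) = \frac{1}{k^5} (-\lambda k + b) \times
\big( b^3 - b^2 k \lambda (k+2) + b \lambda (k^3 + 2k^3 \lambda - 3k^2 + k^2 \lambda + 3k - 1) + \lambda^2 k (k^3 - \lambda k^3 - 3k^2 + 3k - 1) \big) \qquad (8)
\]

By Claim 3,

\[
\frac{d f_1(b)}{db}\bigg|_{b=1}
= -\frac{\partial g(1, b)}{\partial b}\bigg|_{b=1} \cdot \frac{\lambda - 1}{g(\lambda, 1)}\bigg|_{\lambda=1}
= \frac{(k-1)^3 (3k + 2)}{k^5} \cdot \frac{k^5}{(k^2 + 3k + 1)(k-1)^3}
= \frac{3k + 2}{k^2 + 3k + 1}.
\]

Next we compute $e^*(1)$ (using Maple even though it's not necessary):

\[ e^*(1) = N \left[\, k \quad 1 \quad 1 \quad \tfrac{k+1}{k} \,\right]. \]

Choose the normalizing constant $N$ so that $e^*(1)\mathbf{1} = 1$, which gives $N = \frac{k}{k^2 + 3k + 1}$. From (7) we have

\[ \lim_{n \to \infty} \frac{\vec{EP}_n(z)}{n} = e^*(1) \frac{d f_1(b)}{db}\bigg|_{b=1} = \frac{k (3k + 2)}{(k^2 + 3k + 1)^2} \left[\, k \quad 1 \quad 1 \quad \tfrac{k+1}{k} \,\right]. \]

Summing all the components gives us

\[ \gamma^B_{k,1} = \frac{3k + 2}{k^2 + 3k + 1}. \]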
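Claim 3 can be exercised on the factorized polynomial (8) with exact arithmetic. The sketch below is only an illustration: it stores $g(\lambda, b)$ as a dictionary of coefficients, uses the fact that $(\lambda - 1)/g(\lambda, 1)$ at a simple root $\lambda = 1$ equals the reciprocal of $\partial g / \partial \lambda$ there, and compares the resulting slope with $(3k+2)/(k^2+3k+1)$. The helper names `g_poly`, `partial`, `evaluate` and `slope_at_one` are hypothetical.

```python
from fractions import Fraction


def g_poly(k):
    """Coefficients {(i, j): c} of g(lam, b) = sum of c * lam**i * b**j, taken from (8)."""
    cubic = {
        (0, 3): 1,                      # b^3
        (1, 2): -k * (k + 2),           # -b^2 lam k (k + 2)
        (1, 1): (k - 1) ** 3,           # b lam (k^3 - 3k^2 + 3k - 1)
        (2, 1): 2 * k ** 3 + k ** 2,    # b lam^2 (2k^3 + k^2)
        (2, 0): k * (k - 1) ** 3,       # lam^2 k (k^3 - 3k^2 + 3k - 1)
        (3, 0): -k ** 4,                # -lam^3 k^4
    }
    linear = {(0, 1): 1, (1, 0): -k}    # the factor (b - lam k)
    g = {}
    for (i1, j1), c1 in linear.items():
        for (i2, j2), c2 in cubic.items():
            key = (i1 + i2, j1 + j2)
            g[key] = g.get(key, 0) + Fraction(c1 * c2, k ** 5)
    return g


def partial(poly, var):
    """Partial derivative with respect to lam (var = 0) or b (var = 1)."""
    out = {}
    for (i, j), c in poly.items():
        exponent = (i, j)[var]
        if exponent:
            key = (i - 1, j) if var == 0 else (i, j - 1)
            out[key] = out.get(key, 0) + c * exponent
    return out


def evaluate(poly, lam, b):
    return sum(c * lam ** i * b ** j for (i, j), c in poly.items())


def slope_at_one(k):
    """d f_1 / d b at b = 1 by implicit differentiation of g(f_1(b), b) = 0."""
    g = g_poly(k)
    assert evaluate(g, 1, 1) == 0       # lam = 1 is a root when b = 1
    return -evaluate(partial(g, 1), 1, 1) / evaluate(partial(g, 0), 1, 1)


for k in (2, 3, 4):
    print(k, slope_at_one(k), Fraction(3 * k + 2, k * k + 3 * k + 1))
```

For `k = 2` both printed fractions are 8/11.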



This does not give us as much asymptotic information as the two variable
generating function, but it is much less messy and allows us to easily see the limiting
behavior of $\vec{EP}_n(z)$.

It is interesting to compare this limiting behavior to the conjectured behavior
for $\gamma^B_k$. It is guessed that $\sqrt{k}\, \gamma^B_k \to 2$ as $k \to \infty$, whereas $k \gamma^B_{k,1} \to 3$ as
$k \to \infty$.



2.4 2-reach and 3-reach

When $r = 2$, M and N are matrices of size $16 \times 16$. For the two variable
generating function approach, we will restrict to the case $k = 2$. Maple can solve
for $G(a, b)$; $G(a, b)\mathbf{1}$ is a quotient of two high-degree polynomials in $a$ and $b$.
As with 1-reach, we can find $\int \frac{1}{a} \sum_n EL^B_{n,2,2}\, a^n \, da$ to obtain the limiting
behavior of $EL^B_{n,2,2}/n$. The result is an expression about a page long that is very
difficult to read. But it appears that the parts of it most relevant to the asymptotic
behavior are the terms

\[ \frac{152}{197 (1 - a)} \quad \text{and} \quad \frac{16872}{38809} \ln(a - 1), \]

from which we conclude

\[ EL^B_{n,2,2} \approx \frac{152}{197} n - \frac{16872}{38809}. \]

This seems to be consistent with the Monte Carlo approximations, as will be seen
later.

Now for the one variable generating function approach. This can be solved for
general $k$. $g(\lambda, b)$ is too large an expression to be of much worth written down
here. The resulting expression for $\frac{d f_1(b)}{db}\big|_{b=1}$ is surprisingly simple, however. After
the common factors (among them $k + 1$, a power of $k - 1$, $k^4 + k^3 + 3k^2 + k + 1$ and
$k^4 + 3k^3 + 5k^2 + 3k + 1$) cancel,

\[
\gamma^B_{k,2} = \frac{d f_1(b)}{db}\bigg|_{b=1}
= -\frac{\partial g(1, b)}{\partial b}\bigg|_{b=1} \cdot \frac{\lambda - 1}{g(\lambda, 1)}\bigg|_{\lambda=1}
= \frac{5k^3 + 20k^2 + 15k + 2}{k^4 + 10k^3 + 20k^2 + 10k + 1}.
\]

When $k = 2$, this gives $\frac{152}{197}$, which confirms part of the guess for $EL^B_{n,2,2}$ found by
the two variable generating function approach. $e^*(1)$ is illustrated as follows. We
reshape the vector into a matrix so that it is easier to read. The component of
$e^*(1)$ that corresponds to $P_n(z, z - d^x_1, z - d^y_1, z - d^x_1 - d^x_2, z - d^y_1 - d^y_2)$ in $\vec{P}_n(z)$ is represented
by the entry whose row is indexed by $(d^x_1, d^x_2)$ and whose column is indexed by $(d^y_1, d^y_2)$:

\[
\begin{pmatrix}
k^2 & 2k & k & 1 \\
2k & k + 4 & k + 2 & \frac{2(k+1)}{k} \\
k & k + 2 & k + 1 & \frac{2k+1}{k} \\
1 & \frac{2(k+1)}{k} & \frac{2k+1}{k} & \frac{k^2 + 4k + 1}{k^2}
\end{pmatrix}
\]

To normalize $e^*(1)$, the above matrix must be multiplied by $\frac{k^2}{k^4 + 10k^3 + 20k^2 + 10k + 1}$,
which finally gives

\[
\vec{EP}_n(z) \sim n\, \frac{k^2 (5k^3 + 20k^2 + 15k + 2)}{(k^4 + 10k^3 + 20k^2 + 10k + 1)^2}
\begin{pmatrix}
k^2 & 2k & k & 1 \\
2k & k + 4 & k + 2 & \frac{2(k+1)}{k} \\
k & k + 2 & k + 1 & \frac{2k+1}{k} \\
1 & \frac{2(k+1)}{k} & \frac{2k+1}{k} & \frac{k^2 + 4k + 1}{k^2}
\end{pmatrix}.
\]



The case $r = 3$, $k = 2$ is also computable in a reasonable amount of time (it
took Maple about half an hour on a 1992 MHz Dell). The result is

\[ \gamma^B_{2,3} = \frac{3376}{4279}. \]

3 Applications to the Random String model

The machinery developed for r-reach with the Bernoulli Matching model can be
applied to 1-reach with the Random String model when $k = 2$. For $r > 1$, it
appears this same brute force conditional probability approach is so complicated
as to be almost useless. The case $r = 1$ and $k > 2$ seems significantly more difficult than
$r = 1$, $k = 2$, which is rather surprising. We get an interesting reduction for
the $k = 2$ case, as will be seen shortly. The reason for pursuing this approach,
despite its appearance of being difficult to generalize, is that it may lead to a short
proof of $\gamma^B_{2,1} > \gamma_{2,1}$, which may be generalizable. It has been conjectured that
$\lim_{k \to \infty} \gamma_k \sqrt{k} = \lim_{k \to \infty} \gamma^B_k \sqrt{k}$ (actually Sankoff and Mainville conjectured
that $\lim_{k \to \infty} \gamma_k \sqrt{k} = 2$ (see e.g. [3]) and Boutet de Monvel [2] conjectured that
$\lim_{k \to \infty} \gamma^B_k \sqrt{k} = 2$). If 1-reach is solved for general $k$, it may provide some
insights into this problem.

3.1 Detailed analysis of 1-reach

The reduction for the case $k = 2$ is not difficult, but it requires a fair amount of
notation to discuss.

Definition. If $e_{ij}$ is defined for $|i - j| \le r$ and $1 \le i, j \le n$, $e_{ij}$ is a string
realizable configuration of weight $w$ if $e_{ij} = \delta_{u(i),v(j)}$ for $w$ distinct $(u, v) \in \Sigma^n \times \Sigma^n$.

It is easy to convince oneself of the following claim by doing a case by case
analysis for $n = 3$. Such an analysis extends easily to general $n$.

Claim 4 Let $k = 2$ and let $e_{ij}$ be defined for $|i - j| \le 1$ and $1 \le i, j \le n$. $e_{ij}$ is a
string realizable configuration of weight 2 if

\[ \forall i \in \{2, \ldots, n\}, \quad e_{i-1,i-1} + e_{i,i-1} + e_{i-1,i} + e_{i,i} \in \{0, 2, 4\} \qquad (9) \]

and is a string realizable configuration of weight 0 otherwise.
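Proof. $k = 2$ means the alphabet, $\Sigma$, is $\{0, 1\}$ so that

\[
X(u(i-1, i), v(i-1, i)) = \begin{pmatrix} \delta_{u(i-1),v(i-1)} & \delta_{u(i-1),v(i)} \\ \delta_{u(i),v(i-1)} & \delta_{u(i),v(i)} \end{pmatrix}
\]

must be in the set

\[
Y = \left\{
\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},
\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix},
\begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\right\} \qquad (10)
\]

This shows that if the condition in (9) fails, $e_{ij}$ is a string realizable configuration
of weight 0.

For the other part of the claim we proceed by induction on $n$. The case $n = 2$
can be seen by noting that each element of $Y$ is equal to 2 of the 16 possibilities
for $X(u(i-1, i), v(i-1, i))$. Suppose $n > 2$ and the claim holds for $n - 1$. Let
$(u, v)$ and $(u', v')$ be the two pairs of strings of length $n - 1$ such that $\forall i, j \in \{1, \ldots, n-1\}$ with $|i - j| \le 1$,
$e_{ij} = \delta_{u(i),v(j)} = \delta_{u'(i),v'(j)}$. By hypothesis,

\[ Z_n = \begin{pmatrix} e_{n-1,n-1} & e_{n-1,n} \\ e_{n,n-1} & e_{n,n} \end{pmatrix} \]

is one of the eight matrices belonging to $Y$. For each matrix in $Y$, we can choose $u(n)$ and
$v(n)$ to extend $u$ and $v$: take $u(n) = v(n-1)$ if $e_{n,n-1} = 1$ and $u(n) = 1 - v(n-1)$
otherwise, and take $v(n) = u(n-1)$ if $e_{n-1,n} = 1$ and $v(n) = 1 - u(n-1)$ otherwise.
Condition (9) then guarantees $\delta_{u(n),v(n)} = e_{n,n}$. The same choice extends $u'$ and $v'$.

This shows $e_{ij}$ is a string realizable configuration of weight at least 2. The weight
cannot exceed 2 because then $e_{ij}$ restricted to $i, j \in \{1, \ldots, n-1\}$ would have
weight greater than 2. ■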
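Claim 4 is easy to confirm by brute force for small $n$. The following sketch, with hypothetical helper names, enumerates every pair of binary strings of length $n$, records the band configuration each pair induces, and counts how many pairs realize each configuration. It assumes condition (9) is imposed for $i = 2, \ldots, n$, the range for which all four entries exist.

```python
from collections import Counter
from itertools import product


def band(n):
    """Index pairs (i, j), 1-based, with |i - j| <= 1."""
    return [(i, j) for i in range(1, n + 1)
            for j in range(max(1, i - 1), min(n, i + 1) + 1)]


def configuration(u, v):
    """Band configuration e_ij = delta(u(i), v(j)) induced by the strings u and v."""
    return tuple(int(u[i - 1] == v[j - 1]) for i, j in band(len(u)))


def satisfies_condition_9(config, n):
    e = dict(zip(band(n), config))
    return all((e[i - 1, i - 1] + e[i, i - 1] + e[i - 1, i] + e[i, i]) % 2 == 0
               for i in range(2, n + 1))


def weights(n):
    """Map each realized configuration to the number of string pairs producing it."""
    return Counter(configuration(u, v)
                   for u in product((0, 1), repeat=n)
                   for v in product((0, 1), repeat=n))


n = 3
w = weights(n)
admissible = [c for c in product((0, 1), repeat=len(band(n)))
              if satisfies_condition_9(c, n)]
print(set(w.values()), len(w), len(admissible))  # {2} 32 32
```

Every realized configuration has weight exactly 2, and the realized configurations are exactly those satisfying (9).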

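This claim lets us count the probabilities $P_n(z, x_1, y_1)$ much like we did for
the Bernoulli Matching model. We define the analogous probability vector but
we have to break $P_n(z, x_1, y_1)$ into two pieces: $P_n(z, x_1, y_1) = P^{on}_n(z, x_1, y_1) + P^{off}_n(z, x_1, y_1)$, where

\[ P^{on}_n(z, x_1, y_1) = \Pr(R_n(z, x_1, y_1) \text{ and } e_{n,n} = 1), \qquad P^{off}_n(z, x_1, y_1) = \Pr(R_n(z, x_1, y_1) \text{ and } e_{n,n} = 0). \]

The reason for this split is that we need to know $e_{n-1,n-1}$ to determine how
$\{R_{n-1,n-1}, R_{n-2,n-1}, R_{n-1,n-2}\}$ affects $\{R_{n,n}, R_{n-1,n}, R_{n,n-1}\}$. The computation
of M and N was done by hand and was a little trickier than for the Bernoulli
Matching model.

\[
M = \begin{array}{l|cccccccc}
P^{off}_{n-1}(z, z, z) & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{off}_{n-1}(z, z, z-1) & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{off}_{n-1}(z, z-1, z) & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{off}_{n-1}(z, z-1, z-1) & 1/2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{on}_{n-1}(z, z, z) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{on}_{n-1}(z, z, z-1) & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{on}_{n-1}(z, z-1, z) & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{on}_{n-1}(z, z-1, z-1) & 1/2 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}
\]

\[
N = \begin{array}{l|cccccccc}
P^{off}_{n-1}(z-1, z-1, z-1) & 1/4 & 0 & 0 & 0 & 0 & 1/4 & 1/4 & 0 \\
P^{off}_{n-1}(z-1, z-1, z-2) & 0 & 1/4 & 0 & 0 & 0 & 1/4 & 0 & 1/4 \\
P^{off}_{n-1}(z-1, z-2, z-1) & 0 & 0 & 1/4 & 0 & 0 & 0 & 1/4 & 1/4 \\
P^{off}_{n-1}(z-1, z-2, z-2) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/2 \\
P^{on}_{n-1}(z-1, z-1, z-1) & 0 & 1/4 & 1/4 & 0 & 1/4 & 0 & 0 & 1/4 \\
P^{on}_{n-1}(z-1, z-1, z-2) & 0 & 1/4 & 0 & 0 & 0 & 1/4 & 0 & 1/4 \\
P^{on}_{n-1}(z-1, z-2, z-1) & 0 & 0 & 1/4 & 0 & 0 & 0 & 1/4 & 1/4 \\
P^{on}_{n-1}(z-1, z-2, z-2) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/2
\end{array}
\]

The columns follow the same component order as the rows: the four $P^{off}_n$ components
first, then the four $P^{on}_n$ components.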



The two variable generating function approach determines the fine limiting
behavior:

\[ \sum_n EL_{n,2,1}\, a^n = \frac{a (a^2 - 2a + 8)}{2 (a^2 - 4a + 8)(a - 1)^2}. \]

Using Mathematica's Discrete Math RSolve package and a little computation by
hand, we obtain

\[ EL_{n,2,1} = \frac{7}{10} n - \frac{7}{25} + 2^{-3n/2} O(1), \]

where the $O(1)$ term varies like $\cos(n\theta)$. We will compare this result to numerical
approximations.
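The expected value $EL_{n,2,1}$ can also be computed exactly for moderate $n$ by propagating a distribution over sections. The sketch below is an illustration under stated assumptions, not the author's computation: it relies on Claim 4, which for uniformly random binary strings makes $e_{n-1,n}$ and $e_{n,n-1}$ independent fair bits and forces $e_{n,n}$ through the parity condition (9). A state is the triple $(e_{n,n}, z - x_1, z - y_1)$, and the function name `exact_expected_lengths` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def exact_expected_lengths(n_max):
    """Return [E R_{1,1}, ..., E R_{n_max,n_max}] for 1-reach, k = 2, Random String model."""
    half = Fraction(1, 2)
    # Section 1: z = e_11 and x_1 = y_1 = 0.
    dist = {(1, 1, 1): half, (0, 0, 0): half}
    mean = half
    means = [mean]
    for _ in range(2, n_max + 1):
        new_dist = {}
        for (e_prev, dx, dy), prob in dist.items():
            for ex, ey in product((0, 1), repeat=2):   # two free fair bits
                ez = e_prev ^ ex ^ ey                   # forced by condition (9)
                a = ex if dx == 0 else 0
                b = ey if dy == 0 else 0
                inc = 1 if (ez or a or b) else 0        # growth of R on the diagonal
                state = (ez, inc - a, inc - b)
                w = prob / 4
                new_dist[state] = new_dist.get(state, 0) + w
                mean += inc * w
        dist = new_dist
        means.append(mean)
    return means


means = exact_expected_lengths(60)
print(means[0], means[1], means[2])                   # 1/2 9/8 29/16
print(float(means[59] - (Fraction(7, 10) * 60 - Fraction(7, 25))))
```

The last line prints the gap to $\frac{7}{10} n - \frac{7}{25}$ at $n = 60$, which should be negligible given the $2^{-3n/2}$ decay of the correction term.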

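The one variable generating function produces

\[ \det(T(b) - \lambda I) = g(\lambda, b) = -\frac{1}{128} \lambda^3 (b - 2\lambda)(b - 4\lambda)\big(b^3 + b^2 (-8\lambda + 1) + 2b\lambda (-1 + 10\lambda) + 4\lambda^2 (-4\lambda + 1)\big) \]

and

\[
\frac{d f_1(b)}{db}\bigg|_{b=1}
= -\frac{\partial g(1, b)}{\partial b}\bigg|_{b=1} \cdot \frac{\lambda - 1}{g(\lambda, 1)}\bigg|_{\lambda=1}
= \frac{21}{128} \cdot \frac{64}{15} = \frac{7}{10},
\]

\[ e^*(1) = \frac{1}{20} [\, 8 \quad 1 \quad 1 \quad 0 \quad 0 \quad 3 \quad 3 \quad 4 \,]. \]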



We can also "blow up" the 1-reach Bernoulli Matching model, so that we work
with $\vec{P}^{on}_n(z)$ and $\vec{P}^{off}_n(z)$ even though we don't need to. The resulting matrices are
included in the appendix. It is interesting to note that the matrices only differ in
the two rows corresponding to $P_{n-1}(z-1, z-1, z-1)$. The result is

\[ g^B(\lambda, b) = \frac{1}{32} \lambda^4 (b - 2\lambda)\big(b^3 - 8b^2 \lambda + b\lambda (1 + 20\lambda) + 2\lambda^2 (1 - 8\lambda)\big) \]

and this polynomial is the same as one obtained earlier (in (8)) except for the $\lambda^4$
term. Also,

\[ e^{B*}(1) = \frac{1}{22} [\, 7 \quad 2 \quad 2 \quad 0 \quad 1 \quad 2 \quad 2 \quad 6 \,], \]

which is more precise behavior than that determined by the $4 \times 4$ matrix method.

It is unclear whether there is a more direct way to see that the difference in
the matrices for the Random String model and the Bernoulli Matching model leads
to the conclusion $\gamma_{2,1} < \gamma^B_{2,1}$.
4 Numerical Work

We ran Monte Carlo simulations for $k = 2$ and $r = 1, 2, \ldots, 10, 15, 20, 25, 30, 35, 40$.
10000 trials were computed up to $n = 1000$ for each $r$. To obtain behavior
varying with $n$, approximations of $EL^B_{n,2,r}$ and $EL_{n,2,r}$ for all $n$ from 1 to 1000
were computed for each trial. Ideally, we should have computed separate trials
for each $n$, but these results appear to lead to good extrapolations to large $n$.
Following the work in [2], we extrapolate to large $n$ from the small $n$ simulations
based on

\[ EL_{n,2,r} \approx \gamma_{2,r}\, n - A_r \qquad (11) \]

where $A_r$ ($A^B_r$) is a constant, and was found by minimizing the variance of
$\frac{EL_{n,2,r} + A_r}{n}$ (respectively $\frac{EL^B_{n,2,r} + A^B_r}{n}$). Extrapolations for $\gamma^B_{2,r}$, $A^B_r$, and $\gamma_{2,r}$, $A_r$ based on
Monte Carlo simulations are shown below. We did this extrapolation from $n = 50, \ldots, 1000$
to minimize the effect of the $2^{-2n} O(n)$ term (we only saw this for $r = 1$,
but there are probably similar terms for larger $r$).
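The variance-minimizing choice of the offset has a closed form. Writing $\alpha_n = EL_n / n$ and $\beta_n = 1/n$, the variance of $\alpha_n + A \beta_n$ over the sampled $n$ is quadratic in $A$ and is minimized at $A = -\mathrm{Cov}(\alpha, \beta) / \mathrm{Var}(\beta)$. The sketch below assumes that this is the intended criterion; the function name `fit_gamma_and_offset` and the synthetic input are hypothetical.

```python
def fit_gamma_and_offset(values, n_start=1):
    """values[i] approximates EL_n for n = n_start + i; returns (gamma, A)."""
    ns = range(n_start, n_start + len(values))
    alpha = [y / n for y, n in zip(values, ns)]   # EL_n / n
    beta = [1.0 / n for n in ns]                  # 1 / n
    mean_alpha = sum(alpha) / len(alpha)
    mean_beta = sum(beta) / len(beta)
    cov = sum((x - mean_alpha) * (y - mean_beta) for x, y in zip(alpha, beta))
    var = sum((y - mean_beta) ** 2 for y in beta)
    offset = -cov / var                           # minimizes the variance of (EL_n + A) / n
    gamma = mean_alpha + offset * mean_beta       # mean of (EL_n + A) / n at that A
    return gamma, offset


# Synthetic check on data that follows the model exactly.
synthetic = [0.7 * n - 0.28 for n in range(50, 1001)]
print(fit_gamma_and_offset(synthetic, n_start=50))   # approximately (0.7, 0.28)
```

On noisy Monte Carlo averages the same formula applies; the fit is then only as good as the linear model (11).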



r                   1        2        3        4        5        6        7        8        9
$\gamma^B_{2,r}$    0.72726  0.77166  0.78898  0.79813  0.80396  0.80796  0.81119  0.81284  0.81458
$A^B_r$             0.2771   0.4626   0.5641   0.6852   0.8033   0.9399   0.9931   1.0814   1.1900
$\gamma_{2,r}$      0.70014  0.73767  0.75610  0.76718  0.77467  0.78004  0.78408  0.78726  0.78976
$A_r$               0.2652   0.4335   0.5748   0.7048   0.8195   0.9218   1.0163   1.1121   1.2044

r                   10       15       20       25       30       35       40
$\gamma^B_{2,r}$    0.81?92  0.81994  0.82182  0.82290  0.82355  0.82406  0.82415
$A^B_r$             1.2653   1.5253   1.6814   1.7536   1.8058   1.8368   1.8395
$\gamma_{2,r}$      0.79180  0.79819  0.80149  0.80340  0.80462  0.80546  0.80603
$A_r$               1.2877   1.6377   1.8753   2.028    2.1273   2.1939   2.2371

Shown in Figure (2) are $\frac{\mathrm{MonteCarlo}(EL^B_{n,2,r})}{n}$ and $\frac{\mathrm{MonteCarlo}(EL^B_{n,2,r}) + A^B_r}{n}$, and the
corresponding Random String model data is shown in (3). It appears that the
approximation $EL_{n,2,r} \approx \gamma_{2,r}\, n - A_r$ gets increasingly worse for larger $r$ and likewise
for the Bernoulli Matching model.
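A Monte Carlo estimate of this kind can be sketched in a few lines. The code below is not the simulation used for the tables: it draws far fewer trials, treats a single $n$, and uses hypothetical names (`sample_reach_length`, `estimate_ratio`). Only cells inside the band $|i - j| \le r$ are updated; cells outside the band keep the value 0, which never exceeds an in-band neighbor, so the maxima are unaffected.

```python
import random


def sample_reach_length(n, k, r, rng):
    """One sample of R[n][n] for two uniformly random strings of length n over k letters."""
    u = [rng.randrange(k) for _ in range(n)]
    v = [rng.randrange(k) for _ in range(n)]
    R = [[0] * (n + 1) for _ in range(n + 1)]   # out-of-band cells stay 0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            best = max(R[i][j - 1], R[i - 1][j])
            if u[i - 1] == v[j - 1]:
                best = max(best, R[i - 1][j - 1] + 1)
            R[i][j] = best
    return R[n][n]


def estimate_ratio(n, k, r, trials, seed=0):
    """Average of R[n][n] / n over independent trials."""
    rng = random.Random(seed)
    return sum(sample_reach_length(n, k, r, rng) for _ in range(trials)) / (trials * n)


print(estimate_ratio(n=300, k=2, r=1, trials=200))   # should land near 0.7 for r = 1
```

For a fixed `n` the estimate carries both sampling noise and the finite-size offset $-A_r / n$, which is why the text extrapolates over a range of $n$.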

For the Bernoulli Matching model we also can compute $EL^B_{n,2,r}$ exactly for
small $n$ by applying (2) directly beginning with $\vec{P}_r(z)$. This allows us to do two
checks on the quality of the Monte Carlo approximations. Firstly, we can observe
the difference $\mathrm{MonteCarlo}(EL^B_{n,2,r}) - EL^B_{n,2,r}$. The statistic tabulated below, an
average over $n = 1, \ldots, 1000$ of the discrepancy between $\mathrm{MonteCarlo}(EL^B_{n,2,r})$ and
$EL^B_{n,2,r}$, gives us an idea of how crude an approximation we get with 10000 trials. Also,
we can see how good the approximation $EL^B_{n,2,r} \approx \gamma^B_{2,r}\, n - A^B_r$ is by using that
on the exact values of $EL^B_{n,2,r}$ to extrapolate (for this extrapolation we use
$n = 1, \ldots, 2000$).
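For $r = 1$ the exact values mentioned above are cheap to generate. The sketch below is a simplified illustration rather than a literal application of (2): it tracks only the differences $z - x_1$ and $z - y_1$ together with the running mean, which suffices because the increment of $R_{n,n}$ depends on the previous section only through those differences. The function name `exact_bernoulli_lengths` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def exact_bernoulli_lengths(n_max, k):
    """Return [E R_{1,1}, ..., E R_{n_max,n_max}] for 1-reach in the Bernoulli Matching model."""
    p = Fraction(1, k)
    # Section 1: z = e_11 and x_1 = y_1 = 0, so the differences are (1, 1) or (0, 0).
    dist = {(1, 1): p, (0, 0): 1 - p}
    mean = p
    means = [mean]
    for _ in range(2, n_max + 1):
        new_dist = {}
        for (dx, dy), prob in dist.items():
            for ex, ey, ez in product((0, 1), repeat=3):
                w = prob
                for bit in (ex, ey, ez):
                    w *= p if bit else 1 - p
                a = ex if dx == 0 else 0
                b = ey if dy == 0 else 0
                inc = 1 if (ez or a or b) else 0
                state = (inc - a, inc - b)
                new_dist[state] = new_dist.get(state, 0) + w
                mean += inc * w
        dist = new_dist
        means.append(mean)
    return means


means = exact_bernoulli_lengths(40, 2)
print(means[0], means[1], means[2])                   # 1/2 19/16 245/128
print(float(means[39] - (Fraction(8, 11) * 40 - Fraction(32, 121))))
```

The last line compares the exact value at $n = 40$ with $\frac{8}{11} n - \frac{32}{121}$; the difference should be at the level of the $2^{-2n} O(n)$ correction.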



r   Statistic           Monte Carlo $\gamma^B_{2,r}$   $\gamma^B_{2,r}$ from $EL^B_{n,2,r}$   $\gamma^B_{2,r}$ from fractions derived previously
1   5.2994 x 10^(-?)    0.7272634                      0.7272727273                           0.7272727272
2   5.0758 x 10^(-?)    0.7716676                      0.7715736043                           0.7715736040
3   1.0180 x 10^(-?)    0.7889874                      0.7889693851                           0.7889693853
4   1.5954 x 10^(-?)    0.7981354                      0.7982222051

r   Monte Carlo $A^B_r$   $A^B_r$ from $EL^B_{n,2,r}$   $A^B_r$ from fractions derived previously
1   0.2771                0.264463                      0.2644628
2   0.4626                0.434745                      0.4347445
3   0.5641                0.574312
4   0.6852                0.696534

We also note that $\mathrm{MonteCarlo}(\gamma_{2,1}) = 0.7001417$ compared to $\gamma_{2,1} = 0.7$ and
$\mathrm{MonteCarlo}(A_{2,1}) = 0.2652$ compared to $A_{2,1} = 0.28$.

5 Conclusions and future work 

It is hoped that the results presented in this paper lead the way to more significant
results. In particular, it is hoped that the Random String model analysis may
lead to a short proof of $\gamma_{2,1} < \gamma^B_{2,1}$. The limiting behavior of $\gamma^B_{k,1} - \gamma_{k,1}$ would also
be of interest. We seek a conjecture for the quantities $\gamma^B_{k,r}$, though it is unclear if
trying to determine $\gamma^B_k$ via $\lim_{r \to \infty} \gamma^B_{k,r} = \gamma^B_k$ is a good idea.

The pseudoproof of $\gamma^B_k = \frac{2}{1 + \sqrt{k}}$ given by Boutet de Monvel may provide a
way to simplify the r-reach computations. The limiting behavior of r-reach may
be describable only by differences between adjacent values of R, thereby reducing
the "problem size" from $2^{2r}$ to $2r$. Preliminary investigations suggest that this
reduction may be possible but not as straightforward as the argument in the
pseudoproof.

6 Appendix 

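The expanded version of the Bernoulli Matching model $r = 1$, $k = 2$ case has
matrices as follows. These are given for comparison with the matrices for the
Random String model $r = 1$, $k = 2$ case.

\[
M = \begin{array}{l|cccccccc}
P^{off}_{n-1}(z, z, z) & 1/8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{off}_{n-1}(z, z, z-1) & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{off}_{n-1}(z, z-1, z) & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{off}_{n-1}(z, z-1, z-1) & 1/2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{on}_{n-1}(z, z, z) & 1/8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{on}_{n-1}(z, z, z-1) & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{on}_{n-1}(z, z-1, z) & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
P^{on}_{n-1}(z, z-1, z-1) & 1/2 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}
\]

\[
N = \begin{array}{l|cccccccc}
P^{off}_{n-1}(z-1, z-1, z-1) & 1/8 & 1/8 & 1/8 & 0 & 1/8 & 1/8 & 1/8 & 1/8 \\
P^{off}_{n-1}(z-1, z-1, z-2) & 0 & 1/4 & 0 & 0 & 0 & 1/4 & 0 & 1/4 \\
P^{off}_{n-1}(z-1, z-2, z-1) & 0 & 0 & 1/4 & 0 & 0 & 0 & 1/4 & 1/4 \\
P^{off}_{n-1}(z-1, z-2, z-2) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/2 \\
P^{on}_{n-1}(z-1, z-1, z-1) & 1/8 & 1/8 & 1/8 & 0 & 1/8 & 1/8 & 1/8 & 1/8 \\
P^{on}_{n-1}(z-1, z-1, z-2) & 0 & 1/4 & 0 & 0 & 0 & 1/4 & 0 & 1/4 \\
P^{on}_{n-1}(z-1, z-2, z-1) & 0 & 0 & 1/4 & 0 & 0 & 0 & 1/4 & 1/4 \\
P^{on}_{n-1}(z-1, z-2, z-2) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/2
\end{array}
\]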